Coordinated inductive learning using argumentation-based communication
This paper focuses on coordinated inductive learning: how agents with inductive learning capabilities can coordinate their learnt hypotheses with other agents. Coordination in this context means that the hypothesis learnt by one agent is consistent with the data known to the other agents. To address this problem, we present A-MAIL, an argumentation approach that allows agents to argue about hypotheses learnt by induction. A-MAIL integrates, in a single framework, the capabilities of learning from experience, communication, hypothesis revision and argumentation. The A-MAIL approach is therefore a step further toward autonomous agents with learning capabilities that can use, communicate and reason about the knowledge they learn from examples. © 2014, The Author(s). Research partially funded by the projects Next-CBR (TIN2009-13692-C03-01) and Cognitio (TIN2012-38450-C03-03) [both co-funded with FEDER], Agreement Technologies (CONSOLIDER CSD2007-0022), and by the Grants 2009-SGR-1433 and 2009-SGR-1434 of the Generalitat de Catalunya. Peer reviewed.
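The notion of coordination above, a hypothesis that is consistent with the data held by other agents, can be sketched as a simple consistency check. This is a hypothetical illustration with made-up names and a toy attribute-value representation, not the A-MAIL protocol itself:

```python
# Hypothetical sketch of the consistency notion: a rule-like hypothesis
# learnt by one agent is "coordinated" only if it misclassifies none of
# the examples known to another agent. Representation and names are
# illustrative, not taken from A-MAIL.

def covers(hypothesis, example):
    """A hypothesis covers an example if all its conditions hold."""
    return all(example.get(attr) == val for attr, val in hypothesis.items())

def consistent_with(hypothesis, labeled_examples, target_label):
    """True if the hypothesis covers no counter-example in this agent's data."""
    return all(
        label == target_label
        for example, label in labeled_examples
        if covers(hypothesis, example)
    )

# Agent A's induced hypothesis for class "bird": has feathers
hypothesis = {"feathers": True}

# Agent B's data contains a counter-example (a feathered non-bird)
agent_b_data = [
    ({"feathers": True, "flies": True}, "bird"),
    ({"feathers": True, "flies": False}, "toy"),
]

print(consistent_with(hypothesis, agent_b_data, "bird"))  # False: revision needed
```

An inconsistency like the one above is what would trigger hypothesis revision and further argumentation between the agents.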
Similarity measures over refinement graphs
Similarity assessment plays a key role in lazy learning methods such as k-nearest neighbor or case-based reasoning, and similarity also plays a crucial role in support vector machines. In this paper we show how refinement graphs, originally introduced for inductive learning, can be employed to assess and reason about similarity. We define and analyze two similarity measures based on refinement graphs, Sλ and Sπ. The anti-unification-based similarity, Sλ, assesses similarity by finding the anti-unification of two instances: a description capturing all the information common to both. The property-based similarity, Sπ, disintegrates each instance into a set of properties and then compares these property sets. Moreover, these similarity measures are applicable to any representation language for which a refinement graph satisfying the requirements we identify can be defined. Specifically, we present a refinement graph for feature terms, in which several languages of increasing expressiveness can be defined. The similarity measures are empirically evaluated on relational data sets belonging to languages of different expressiveness. © 2011 The Author(s). Support for this work came from the project Next-CBR TIN2009-13692-C03-01 (co-sponsored by EU FEDER funds). Peer reviewed.
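The property-based idea, disintegrating instances into property sets and comparing the sets, can be sketched in a few lines. This is a minimal illustration using a Jaccard ratio as a stand-in aggregation over attribute-value instances, not the exact Sπ definition over refinement graphs:

```python
# Minimal sketch of the property-based similarity idea: disintegrate each
# instance into a set of atomic properties, then compare the sets. The
# Jaccard ratio used here is an illustrative aggregation, not the exact
# S_pi measure from the paper, which operates over refinement graphs.

def properties(instance):
    """Flatten an attribute-value instance into a set of atomic properties."""
    return {(attr, val) for attr, val in instance.items()}

def property_similarity(a, b):
    """Fraction of distinct properties shared by the two instances."""
    pa, pb = properties(a), properties(b)
    union = pa | pb
    return len(pa & pb) / len(union) if union else 1.0

x = {"color": "red", "shape": "round", "size": "small"}
y = {"color": "red", "shape": "square", "size": "small"}

print(property_similarity(x, y))  # 2 shared of 4 distinct properties -> 0.5
```

The anti-unification-based measure Sλ works differently: it builds the most specific description common to both instances and scores how much information that shared description retains.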
The Personalization Paradox: the Conflict between Accurate User Models and Personalized Adaptive Systems
Personalized adaptation technology has been adopted in a wide range of
digital applications such as health, training and education, e-commerce and
entertainment. Personalization systems typically build a user model, aiming to
characterize the user at hand, and then use this model to personalize the
interaction. Personalization and user modeling, however, are often
intrinsically at odds with each other, a tension sometimes referred to as the
personalization paradox. In this paper, we take a closer look at this
personalization paradox and identify two ways in which it can manifest:
feedback loops and moving targets. To illustrate these issues, we report
results in the domain of personalized exergames (videogames for physical
exercise), and describe our early steps toward addressing some of the issues
raised by the personalization paradox.
Comment: arXiv admin note: substantial text overlap with arXiv:2101.1002
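The feedback-loop manifestation can be made concrete with a toy simulation, entirely illustrative and not from the paper: when a greedy personalization system only serves what its model already favors, it only ever observes feedback about those items, so estimates for everything else can never be corrected.

```python
# Toy illustration (not from the paper) of a personalization feedback loop:
# the system serves only what the model already favors, so it observes
# feedback only about those items and its other estimates never update.
true_likes = {"A": 0.55, "B": 0.9}     # the user actually prefers B
estimates  = {"A": 0.6,  "B": 0.5}     # model starts slightly favoring A

for _ in range(10):
    served = max(estimates, key=estimates.get)    # greedy personalization
    feedback = true_likes[served]                 # engagement with the served item
    # update only the served item's estimate: that is all we observed
    estimates[served] = 0.5 * estimates[served] + 0.5 * feedback

print(estimates)  # B is stuck at 0.5; the user's true favorite is never discovered
```

Because A's estimate converges to 0.55 and never drops below B's frozen 0.5, the loop is self-confirming: the model's own choices bias the data it learns from. The moving-target problem compounds this, since an effective adaptation (e.g. an exergame improving fitness) changes the very user the model was fit to.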
A Closer Look at Invalid Action Masking in Policy Gradient Algorithms
In recent years, Deep Reinforcement Learning (DRL) algorithms have achieved
state-of-the-art performance in many challenging strategy games. Because these
games have complicated rules, an action sampled from the full discrete action
space will typically be invalid. The usual approach to deal with this problem
in policy gradient algorithms is to "mask out" invalid actions and just sample
from the set of valid actions. The implications of this process, however,
remain under-investigated. In this paper, we show that the standard working
mechanism of invalid action masking corresponds to valid policy gradient
updates. More interestingly, it works by applying a state-dependent
differentiable function during the calculation of the action probability
distribution. Additionally, we show its critical importance to the performance
of policy gradient algorithms. Specifically, our experiments show that invalid
action masking scales well when the space of invalid actions is large, while
the common approach of giving negative rewards for invalid actions will fail.
Finally, we provide further insights by evaluating different action masking
regimes, such as removing masking after an agent has been trained using
masking.
Comment: Preprint. Corrected a major issue of the withdrawn version submitted
to NeurIPS 202
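The standard masking mechanism described above can be sketched directly: invalid actions' logits are replaced with a large negative constant before the softmax, which drives their probability to (numerically) zero. This is a plain NumPy illustration of the general technique, not the authors' experimental code:

```python
# Sketch of invalid action masking: replace the logits of invalid actions
# with a large negative constant before the softmax. This is the
# state-dependent differentiable transformation of the logits discussed
# above; the valid mask changes with the environment state.
import numpy as np

def masked_policy(logits, valid_mask):
    """Return action probabilities with invalid actions masked out."""
    masked_logits = np.where(valid_mask, logits, -1e8)
    exps = np.exp(masked_logits - masked_logits.max())   # stable softmax
    return exps / exps.sum()

logits = np.array([1.0, 2.0, 0.5, 3.0])
valid = np.array([True, False, True, False])  # actions 1 and 3 are invalid

probs = masked_policy(logits, valid)
print(probs)  # invalid actions get (numerically) zero probability
```

Sampling then proceeds from `probs` as usual; because the masked logits still flow through the softmax, the gradient with respect to the valid actions' logits remains well-defined, which is why masking yields valid policy gradient updates.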